Grounded Knowledge Bases for Scientific Domains

Authors

  • Dana Movshovitz-Attias
  • Tom Mitchell
  • Roni Rosenfeld
  • Alon Halevy
Abstract

This thesis focuses on building knowledge bases (KBs) for scientific domains. Specifically, we create structured representations of technical-domain information using unsupervised or semi-supervised learning methods. This work is inspired by recent advances in knowledge base construction based on Web text. However, in the technical domains we consider here, in addition to text corpora we have access to the objects named by text entities, as well as data associated with those objects. For example, in the software domain, we consider the implementation of classes in code repositories and observe how they are used in programs. In the biomedical realm, biological ontologies define interactions and relations between domain entities, and there is experimental information on entities such as proteins and genes. We consider the process of grounding, namely, linking entity mentions from text to external domain resources, including code repositories and biomedical ontologies, where objects can be uniquely identified. Grounding presents an opportunity to learn not only how entities are discussed in text, but also what their real-world properties are. The main contribution of this thesis is in addressing challenges from the following research areas, in the context of learning about technical domains: (1) Knowledge representation: How should knowledge about technical domains be represented and used? (2) Grounding: How can existing resources of technical domains be used in learning? (3) Applications: What applications can benefit from structured knowledge bases dedicated to scientific data? We explore grounded learning and knowledge base construction for the biomedical and software domains. We first discuss approaches for improving applications based on well-studied statistical language models. Next, we construct a deeper semantic representation of domain entities by building a grounded ontology, where entities are linked to a code repository, and through an adaptation of an ontology-driven KB learner to scientific input. Finally, we present a topic model framework for knowledge base construction, which jointly optimizes the KB schema and learned facts, and show that this framework produces high-precision KBs in our two domains of interest. We discuss extensions to our model that allow, first, incorporating human input, leading to a semi-supervised learning process, and second, grounding the modeled entities with domain data.

Similar resources

Grounded Knowledge Bases for Scientific Domains

This thesis is focused on building knowledge bases (KBs) for scientific domains. Specifically, we create structured representations of technical-domain information using unsupervised or semi-supervised learning methods. This work is inspired by recent advances in knowledge base construction based on Web text. However, in the technical domains we consider here, we have grounded data about the ob...

Consistency of scientific knowledge bases

The Plinius-project at the University of Twente is aimed at constructing large-scale knowledge bases in scientific and engineering domains, through computer analysis of natural-language texts. One of the many problems in this endeavor is caused by inconsistencies in the scientific literature. In this short report (of which a fuller version is available [de Jong, 1992]) we describe our approach ...

The patterns and behaviors of researchers’ knowledge sharing in scientific social networks: A Case Study of Research Gate’s Question And Answer System

Aim: Scientific social networks were shaped as part of a set of social software and a platform for international interactions sharing the tangible and intangible knowledge of researchers. The purpose is to investigate the patterns and behaviors of knowledge sharing of researchers in Research Gate. Based on this, the question and answer system of this scientific social network was analyzed and r...

Text Knowledge Engineering by Qualitative

We propose a methodology for enhancing domain knowledge bases through natural language text understanding. The acquisition of previously unknown concepts is based on the assessment of the "quality" of linguistic and conceptual evidence underlying the generation and refinement of concept hypotheses. Text understanding and concept learning are both grounded on a terminological knowledge representa...

Designing the internal evaluation indicators of educational planning in postgraduate program (input, process, outcome domains) in public health faculty. Isfahan

Introduction: This study aimed to design educational program indicators for the internal evaluation of graduate courses in a health school. Following a systemic approach to the educational framework, the related indicators were categorized into three groups: input, process and output. Method: First, indicators of graduate educational programs were defined based on different resources and scho...

Publication date: 2015